A Systematic Comparison of Training Criteria for Statistical Machine Translation

نویسندگان

  • Richard Zens
  • Sasa Hasan
  • Hermann Ney
چکیده

We address the problem of training the free parameters of a statistical machine translation system. We show significant improvements over a state-of-the-art minimum error rate training baseline on a large ChineseEnglish translation task. We present novel training criteria based on maximum likelihood estimation and expected loss computation. Additionally, we compare the maximum a-posteriori decision rule and the minimum Bayes risk decision rule. We show that, not only from a theoretical point of view but also in terms of translation quality, the minimum Bayes risk decision rule is preferable.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Minimum Error Rate Training in Statistical Machine Translation

Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality. These training criteria make use of recently propose...

متن کامل

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

متن کامل

Considerations in Maximum Mutual Information and Minimum Classi- fication Error training for Statistical Machine Translation

Discriminative training methods are used in statistical machine translation to effectively introduce and combine additional knowledge sources within the translation process. Although these methods are described in the accompanying literature and comparative studies are available for speech recognition, additional considerations are introduced when applying discriminative training to statistical...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Exploratory-cumulative vs. Disputational Talk on Cognitive Dependency of Translation Studies: Intermediate level students in focus

The present study set out to determine the effect of implementing exploratory-cumulative talk in comparison to disputational talk on cognitive (meaning development and organization of thought as well as problem solving ability) dependency of intermediate level students in translation studies. In order to achieve the objectives of the study, a quasi-experimental-pretest-posttest-statistical stud...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007